WIP: experiment with first class dim objects #1517

Open · wants to merge 7 commits into main

Conversation

aseyboldt (Member) commented Jul 2, 2025

Named Dimensions Refactor: Objects Instead of Strings

I'm still working on this, but thought it might be helpful to share what I have so far...

The Key Change

In this version of named-dims, we use objects to represent dimensions instead of plain strings. This allows us to ensure that array axes with shared dimensions are always compatible, eliminating shape errors as long as you stay in dim-land.

The Two Parts of a Dimension

We can think of a dimension as having two components:

  1. Size (or length) - This might be known statically or only at runtime.

  2. Identity - Just because two tensors happen to have the same length doesn't mean they're compatible. The identity decides if two tensors can be combined. Crucially, if two tensors share the same identity, they must always have the same length.

This is similar to vector spaces in math: you can't add a 3D velocity vector to a 3D position vector, even though both are 3D. The mathematical operations care about the meaning of the dimensions, not just their size.
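
To make the split concrete, here is a small illustrative sketch in the spirit of the examples below (the alignment behavior described in the comments is my reading of the design, not verified output):

import pytensor.xtensor as px  # assumed import path; the examples below use the px alias

# Two independently created dims: even if both end up with the same runtime
# length, they carry different identities.
velocity_dim = px.dim("space")
position_dim = px.dim("space")

v = px.basic.zeros(velocity_dim, name="v")
p = px.basic.zeros(position_dim, name="p")

# v + v aligns the axes because they share one identity.
# v + p never treats the two axes as the same dimension, regardless of their
# runtime lengths, because the identities differ.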

Implementation: Types and Variables

We implement this split using PyTensor's type system:

  • Each dimension has a unique PyTensor Type (an instance of DimType) for its identity
  • When we need to work with the dimension (like creating tensors), we also need its size, represented as a DimVariable.
# Create a new dimension
>>> foo = px.dim("foo")
>>> foo.type
BasicDim(foo, uuid=?)

The object foo itself is a DimVariable - at runtime, this represents the size of dimension foo.

Creating Tensors with Dimensions

>>> x = px.basic.zeros(foo, name="x")
>>> x.type
XTensorType(float64, shape=(None,), dims=(BasicDim(foo, uuid=?),))

The tensor x remembers the identity of dimension foo in its type. It doesn't need to store the DimVariable separately because it can recreate one from the tensor itself when needed:

>>> x.dims[0].dprint();
FromTensor{dim_type=BasicDim(foo, uuid=?)} [id A] 'foo'
 └─ XTensorFromTensor [id B] 'x'
    ├─ Alloc [id C]
    │  ├─ 0.0 [id D]
    │  └─ TensorFromScalar [id E]
    │     └─ Length [id F]
    │        └─ foo [id G]
    └─ foo [id G]

Ensuring Dimension Uniqueness

To prevent shape errors, we need to avoid having two unrelated DimVariables with the same type. Every call to px.dim() creates a truly unique dimension:

>>> foo = px.dim("foo")
>>> foo2 = px.dim("foo")  # Same name, different dimension!
>>> foo.type == foo2.type
False

We use random UUIDs in the type to guarantee uniqueness.
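
Schematically, the identity can be pictured like this (a simplified stand-in, not the actual DimType implementation):

import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BasicDimSketch:
    # Simplified stand-in for BasicDim: the identity is the pair (name, random uuid),
    # so two dims created with the same name still compare unequal.
    name: str
    uid: uuid.UUID = field(default_factory=uuid.uuid4)

assert BasicDimSketch("foo") != BasicDimSketch("foo")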

The size Invariant

For consistent graphs, we maintain this invariant: "During function execution, if two DimVariables have the same type, their runtime values are also the same."

This works because DimVariables can only be created in three ways:

  1. Root variables - px.dim() creates a new unique type, so it can't share its type with anything else.
  2. From tensors - Either we already had a DimVariable to create the tensor, so the length is consistent by construction, or the tensor was user-provided; in that case we must add a consistency check of the user input.
  3. Derived from other DimVariables - If the inputs are consistent, the outputs are too.

The main challenge is user input validation - we need to verify that input tensors match their declared dimensions before execution.
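
A sketch of what that check could look like (hypothetical helper, not code from this PR; the real validation would hook into the function machinery and read the dims from each input's XTensorType):

import numpy as np

def check_dim_lengths(inputs):
    # Hypothetical runtime check: every axis that carries the same dim identity
    # must have the same length across all user-provided arrays.
    # `inputs` is a list of (array, dim_identities) pairs, mirroring the dims
    # stored in each XTensorType.
    seen = {}
    for array, dim_ids in inputs:
        for axis, dim_id in enumerate(dim_ids):
            length = array.shape[axis]
            if seen.setdefault(dim_id, length) != length:
                raise ValueError(
                    f"Inconsistent length for dim {dim_id}: "
                    f"expected {seen[dim_id]}, got {length}"
                )

# Both arrays claim their first axis carries the same dim identity,
# so their lengths along that axis must agree.
check_dim_lengths([
    (np.zeros((3, 2)), ("foo-uuid", "bar-uuid")),
    (np.zeros((3,)), ("foo-uuid",)),
])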

Small sidenote:

Unfortunately there is a way users can create two unrelated DimVariable objects with the same type:

foo = px.dim("foo")
foo2 = foo.type()

But if we assume that foo.type() is a private function (or maybe we can override the call method to make that clearer), that shouldn't be too much of a problem. We just have to make sure we don't do it ourselves when we add new Ops...

Derived Dimensions

I think we can do a lot of cool things with derived dimensions, but I'm still working on those.

One simple example that already works is a ClonedDim. We don't allow duplicate dimensions in one tensor, to simplify indexing and xarray compatibility, but in many cases a user might still need essentially the same dim in a tensor twice (for instance for a covariance matrix). We can use a cloned dimension for that. A cloned dimension always has the same length as its base dimension, but it has a new identity. So for instance:

>>> foo = px.dim("foo")
>>> # This fails
>>> px.xtensor("x", dims=[foo, foo])
ValueError...
>>> foo2 = foo.clone_dim()
>>> x = px.xtensor("x", dims=[foo, foo2])

@OriolAbril @ricardoV94


📚 Documentation preview 📚: https://pytensor--1517.org.readthedocs.build/en/1517/

ricardoV94 (Member) commented Jul 4, 2025

>>> foo = px.dim("foo")
>>> foo2 = px.dim("foo")  # Same name, different dimension!
>>> foo.type == foo2.type
False

Is this still true? I would think that when a user is working they may specify dims by label, so they write x.sum("city"), and under the hood this would work fine, because we can convert the user string into a BasicDim that compares equal anyway?

Also thinking the user could do x.rename({SliceDim("time"): "time*"}) without having to worry about exactly how the SliceDim("time") is created or where to retrieve it from.

aseyboldt (Member, Author):
In the current code that is still true.
I think we can get away with allowing those to be equal (if we do some extra input validation, you shouldn't be allowed to pass two different values for the length of the same dimension). I'm not sure we should want them to be equal, however. In pytensor, we also don't assume that two tensor variables are the same just because they happen to have the same name.

ricardoV94 (Member) commented Jul 4, 2025

We can treat dims as symbols (not sure if that's the term), since in an xarray Dataset you can't have duplicate dims with different meanings either?

But it's a choice not a requirement


raise NotImplementedError("Subclass did not implement dim broadcasting")


class BasicDim(DimType):
Member:

Nit: I would perhaps split dim_type / var_type into separate files; this one is already pretty long as is.

return Product()(*dims, name=name)


def rebase_dim(dim: DimVariable | DimType, *tensors: XTensorVariable) -> DimVariable:
Member:

What is the purpose of rebase_dim?

Member:

Create a dim from an existing xtensor / get the length at runtime?

Member Author:

That's a helper for rewrites to avoid infinite loops:

For instance in Elemwise:

@register_lower_xtensor
@node_rewriter(tracks=[XElemwise])
def lower_elemwise(fgraph, node):
    assert len(node.outputs) == 1
    out_dims = node.outputs[0].dims
    out_dims = [rebase_dim(dim, *node.inputs) for dim in out_dims]

    # Convert input XTensors to Tensors and align batch dimensions
    tensor_inputs = [lower_aligned(inp, out_dims) for inp in node.inputs]

    tensor_outs = Elemwise(scalar_op=node.op.scalar_op)(
        *tensor_inputs, return_list=True
    )

    # Convert output Tensors to XTensors
    new_outs = [
        xtensor_from_tensor(tensor_out, dims=out_dims, check=False)
        for tensor_out in tensor_outs
    ]
    return new_outs

The final XTensorFromTensor op takes the dim variables as inputs. And if we were to use node.outputs[0].dims for those, the returned graph would still contain a reference to the XElemwise we want to replace, because those dims are variables that use DimFromTensor(XElemwise) to get a reference to the dimension length.
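
Roughly, the rebasing amounts to something like this (an illustrative sketch only; dim_from_tensor is a hypothetical helper name standing in for the FromTensor machinery shown above):

def rebase_dim_sketch(dim, *tensors):
    # Find an input tensor whose type carries the same dim identity and rebuild
    # the DimVariable from that tensor, so the rewritten graph no longer
    # references the node being replaced.
    dim_type = dim.type if hasattr(dim, "type") else dim
    for tensor in tensors:
        if dim_type in tensor.type.dims:
            return dim_from_tensor(tensor, dim_type)  # hypothetical helper
    raise ValueError(f"Dim {dim_type} not found in any of the given tensors")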

ricardoV94 (Member):
Looking good. Do you already have any op that generates its own dims working?

@@ -96,7 +110,7 @@ def var(x, dim: REDUCE_DIM, *, ddof: int = 0):
     x = as_xtensor(x)
     x_mean = mean(x, dim)
     n = _infer_reduced_size(x, x_mean)
-    return square(x - x_mean) / (n - ddof)
+    return square(x - x_mean).mean(dim) / (n - ddof)
Member:

I just fixed this. Sanity check for myself: it should be sum, right? We then divide by n - ddof.

Suggested change
-    return square(x - x_mean).mean(dim) / (n - ddof)
+    return square(x - x_mean).sum(dim) / (n - ddof)

Member Author:

Oh, yes, I missed that we divide 🤦
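
For context: mean(dim) already divides by the reduced size n, so keeping .mean(dim) and then dividing by n - ddof would normalize twice. With the suggested change applied, the body of var reads roughly:

def var(x, dim: REDUCE_DIM, *, ddof: int = 0):
    x = as_xtensor(x)
    x_mean = mean(x, dim)
    n = _infer_reduced_size(x, x_mean)
    # sum of squared deviations, normalized once by n - ddof
    return square(x - x_mean).sum(dim) / (n - ddof)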

aseyboldt (Member, Author):
I'm currently working on shape.py with stack, unstack etc. Should be coming soon. That and indexing are the majority of tests that are still failing.
